
    A Comparison of Four Approaches to Discretization Based on Entropy

    We compare four discretization methods, all based on entropy: the original C4.5 approach to discretization, two globalized methods, known as equal interval width and equal frequency per interval, and a relatively new discretization method called multiple scanning, using the C4.5 decision tree generation system. The main objective of our research is to compare the quality of these four methods using two criteria: an error rate evaluated by ten-fold cross-validation and the size of the decision tree generated by C4.5. Our results show that multiple scanning is the best discretization method in terms of the error rate, and that decision trees generated from datasets discretized by multiple scanning are simpler than decision trees generated directly by C4.5 or generated from datasets discretized by either globalized discretization method.
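The two globalized methods are only named in the abstract; as a rough sketch of the underlying single-attribute (local) versions they generalize, with function names and toy data of our own:

```python
def equal_interval_width(values, k):
    """Cut the range of a continuous attribute into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # Interior cut points; a value is then mapped to the interval it falls in.
    return [lo + width * i for i in range(1, k)]

def equal_frequency(values, k):
    """Choose cut points so each interval holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k] for i in range(1, k)]

def discretize(value, cuts):
    """Return the index of the interval the value falls into."""
    return sum(value >= c for c in cuts)

ages = [19, 21, 22, 24, 30, 31, 45, 50, 61, 70]
print(equal_interval_width(ages, 3))  # [36.0, 53.0]
print(equal_frequency(ages, 3))       # [24, 45]
print(discretize(33, equal_interval_width(ages, 3)))  # 0 (first interval)
```

The entropy-based globalization studied in the paper chooses which attribute to cut next across the whole table; the sketch only shows the interval shapes the two names refer to.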

    The usefulness of a machine learning approach to knowledge acquisition

    This paper presents results of experiments showing how machine learning methods are useful for rule induction in the process of knowledge acquisition for expert systems. Four machine learning methods were used: ID3, ID3 with dropping conditions, and two options of the system LERS (Learning from Examples based on Rough Sets): LEM1 and LEM2. Two knowledge acquisition options of LERS were used as well. All six methods were used for rule induction from six real-life data sets. The main objective was to test how an expert system, supplied with these rule sets, performs without information on a few attributes. Thus the expert system attempts to classify examples in which the values of some attributes are missing. The experiments make clear that all machine learning methods performed much worse than the knowledge acquisition options of LERS. Thus, machine learning methods used for knowledge acquisition should be replaced by other methods of rule induction that generate complete sets of rules. The knowledge acquisition options of LERS are examples of such appropriate ways of inducing rules for building knowledge bases.
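The abstract does not give LERS's rule format; a minimal sketch of why missing attribute values defeat an incomplete rule set (attribute names and rules are invented for illustration):

```python
# A rule is a set of (attribute, value) conditions plus a decision.
rules = [
    ({"temperature": "high", "headache": "yes"}, "flu"),
    ({"temperature": "normal"}, "healthy"),
]

def classify(case, rules):
    """Return the decision of the first rule whose conditions all match.

    An attribute whose value is missing (None) never satisfies a
    condition, so an incomplete rule set may leave the case
    unclassified -- the failure mode the experiments above measure.
    """
    for conditions, decision in rules:
        if all(case.get(a) == v for a, v in conditions.items()):
            return decision
    return None  # no rule matched: the case stays unclassified

print(classify({"temperature": "high", "headache": "yes"}, rules))  # flu
print(classify({"temperature": "high", "headache": None}, rules))   # None
```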

    Global discretization of continuous attributes as preprocessing for machine learning

    Real-life data usually are presented in databases by real numbers. On the other hand, most inductive learning methods require a small number of attribute values. Thus it is necessary to convert input data sets with continuous attributes into input data sets with discrete attributes. Methods of discretization restricted to single continuous attributes will be called local, while methods that simultaneously convert all continuous attributes will be called global. In this paper, a method of transforming any local discretization method into a global one is presented. A global discretization method, based on cluster analysis, is presented and compared experimentally with three known local methods, transformed into global. Experiments include tenfold cross-validation and leaving-one-out methods for ten real-life data sets.

    Partition triples: A tool for reduction of data sets

    Data sets discussed in this paper are presented as tables with rows corresponding to examples (entities, objects) and columns to attributes. A partition triple is defined for such a table as a triple of partitions on the set of examples, the set of attributes, and the set of attribute values, respectively, preserving the structure of the table. The idea of a partition triple is an extension of the idea of a partition pair, introduced by J. Hartmanis and R. E. Stearns in automata theory. Results characterizing partition triples and algorithms for computing partition triples are presented. The theory is illustrated by an example of an application in machine learning from examples. (C) 1996 Academic Press, Inc.
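As a concrete picture of one component of such a triple, the partition on the set of examples induced by a single attribute groups the rows that attribute cannot distinguish (a minimal sketch; the table is invented):

```python
from collections import defaultdict

def partition_by(table, attribute):
    """Group example (row) indices into blocks that agree on the attribute.

    This is the partition on the set of examples induced by one column;
    partitions induced by sets of attributes refine it further.
    """
    blocks = defaultdict(list)
    for i, row in enumerate(table):
        blocks[row[attribute]].append(i)
    return sorted(blocks.values())

table = [
    {"size": "big",   "color": "red"},
    {"size": "small", "color": "red"},
    {"size": "big",   "color": "blue"},
]
print(partition_by(table, "size"))   # [[0, 2], [1]]
print(partition_by(table, "color"))  # [[0, 1], [2]]
```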

    Reduced Data Sets and Entropy-Based Discretization

    This work is licensed under a Creative Commons Attribution 4.0 International License. Results of experiments on numerical data sets discretized using two methods, global versions of Equal Frequency per Interval and Equal Interval Width, are presented. Globalization of both methods is based on entropy. For the discretized data sets, left and right reducts were computed. For each discretized data set and the two data sets based, respectively, on left and right reducts, we applied ten-fold cross-validation using the C4.5 decision tree generation system. Our main objective was to compare the quality of all three types of data sets in terms of an error rate. Additionally, we compared the complexity of the generated decision trees. We show that reduction of data sets may only increase the error rate, and that the decision trees generated from reduced data sets are not simpler than the decision trees generated from non-reduced data sets.
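Ten-fold cross-validation as used throughout these experiments can be sketched as follows (a majority-vote classifier stands in for C4.5, which is not reproduced here; names and data are our own):

```python
import random

def ten_fold_error_rate(examples, train_and_classify, folds=10, seed=0):
    """Estimate an error rate by ten-fold cross-validation.

    Each example is held out exactly once; the classifier is rebuilt on
    the remaining nine folds and scored on the held-out fold.
    """
    data = examples[:]
    random.Random(seed).shuffle(data)
    errors = 0
    for f in range(folds):
        test = data[f::folds]                                  # held-out fold
        train = [e for i, e in enumerate(data) if i % folds != f]
        classify = train_and_classify(train)
        errors += sum(classify(x) != y for x, y in test)
    return errors / len(data)

# Toy classifier: always predict the majority label of the training set.
def majority(train):
    labels = [y for _, y in train]
    winner = max(set(labels), key=labels.count)
    return lambda x: winner

examples = [(i, "a") for i in range(70)] + [(i, "b") for i in range(30)]
print(ten_fold_error_rate(examples, majority))  # 0.3
```

With a 70/30 label split, the majority label wins in every training fold, so every "b" example is misclassified exactly once, giving an error rate of 0.3.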

    Improving prediction of preterm birth using a new classification scheme and rule induction

    Prediction of preterm birth is a poorly understood domain. The existing manual methods of assessment of preterm birth are 17%-38% accurate. The machine learning system LERS was used for three different datasets about pregnant women. Rules induced by LERS were used in conjunction with a classification scheme of LERS, based on the "bucket brigade" algorithm of genetic algorithms and enhanced by partial matching. The resulting prediction of preterm birth in new, unseen cases is much more accurate (68%-90%).

    Entropy of English text: Experiments with humans and a machine learning system based on rough sets

    The goal of this paper is to show the dependency of the entropy of English text on the subject of the experiment, the type of English text, and the methodology used to estimate the entropy. Claude Shannon first described the technique for estimating the entropy of English text by a human subject guessing the next letter after viewing a string of characters taken from actual text. We show how this result is affected by using different humans in the experiment (Shannon used only his wife) and by using different types of text material (Shannon used only a single book). We also show how the results are affected when we replace the human subjects with a machine learning system based on rough sets. Automating the play of the guessing game with this system, called LERS, gives rise to a lossless data compression scheme. (C) Elsevier Science Inc. 1998
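The guessing game can be sketched with a simple frequency model standing in for the human (or LERS) predictor; the text, context length, and summary statistic below are illustrative, not the paper's protocol:

```python
from collections import Counter, defaultdict
from math import log2

def guess_counts(text, order=1):
    """Play the guessing game: rank candidate next letters by how often
    they followed the current context earlier in the text, and record
    how many guesses the true letter takes."""
    follows = defaultdict(Counter)
    counts = []
    for i in range(order, len(text)):
        context, actual = text[i - order:i], text[i]
        ranking = [c for c, _ in follows[context].most_common()]
        unseen = sorted(set(text) - set(ranking))  # never-seen letters go last
        counts.append((ranking + unseen).index(actual) + 1)
        follows[context][actual] += 1
    return counts

counts = guess_counts("the theory of the thing " * 4)
dist = Counter(counts)
n = len(counts)
# Entropy of the guess-number distribution, in bits per character; Shannon
# derived upper and lower bounds on the text entropy from these same counts.
h_guess = -sum((m / n) * log2(m / n) for m in dist.values())
print(round(h_guess, 3))
```

A better predictor concentrates the guess numbers near 1, which is also what makes the automated game usable as a lossless compression scheme: only the guess numbers need to be transmitted.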

    Operation-preserving functions and autonomous factors of finite automata

    The relationship between the structure of autonomous finite automata and their operation-preserving functions is considered. The results suggest ideas for the study of operation-preserving functions of arbitrary finite automata, because with each finite automaton the set of its autonomous factors is associated. Based on this method of investigating the operation-preserving functions of a finite automaton A, and by studying the autonomous factors of A, an algorithm for determining the operation-preserving functions of A is given.
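For the autonomous case the defining condition is small enough to brute-force: an autonomous automaton is just a next-state function delta with no input, and an operation-preserving function is one that commutes with it. A sketch (the state set and transition function are invented):

```python
from itertools import product

def preserves(delta, f, states):
    """Check f(delta(q)) == delta(f(q)) for every state q."""
    return all(f[delta[q]] == delta[f[q]] for q in states)

def all_preserving(delta, states):
    """Brute-force every function on states and keep the preserving ones."""
    found = []
    for images in product(states, repeat=len(states)):
        f = dict(zip(states, images))
        if preserves(delta, f, states):
            found.append(f)
    return found

# A 3-cycle: 0 -> 1 -> 2 -> 0.
delta = {0: 1, 1: 2, 2: 0}
fns = all_preserving(delta, [0, 1, 2])
print(len(fns))  # 3: the operation-preserving functions of a cycle are its rotations
```

The paper's algorithm avoids this exponential enumeration by exploiting structure; the brute-force version only illustrates the property being computed.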